perm filename A36.TEX[106,PHY] blob sn#807735 filedate 1985-11-05 generic text, type C, neo UTF8
COMMENT āŠ—   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00011 ENDMK
CāŠ—;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a36.tex[106,phy] \today\hfill}

\bigskip
\line{\bf Case Study: Table Searching.\hfil}

\bigskip
Suppose we have a large array, where the elements are of an ordered type
({\tt REAL} or {\tt INTEGER} or a string type, for example), in which
we want to find a particular value. Perhaps the array contains a list of
stolen or fraudulent credit card numbers, and we want to see if a
proffered card is in the list.

If the items in the array are placed there without regard to their order,
there is no alternative to examining each element until an equality is
found:

{\obeylines\obeyspaces\let =\ \tt
        L:=1;
        WHILE (L<=NUMBER) AND CARD<>BADCARD[L] DO L:=L+1;
        IF L<=NUMBER THEN WRITE('CARD IS BAD')
        ELSE WRITE('CARD IS GOOD')
}

The above algorithm examines in {\tt BADCARD[1]} to {\tt BADCARD[NUMBER+1]},
so the array declaration must accomodate at least this range to avoid a
common run time error halt. When {\tt L}~is in the range {\tt 1..NUMBER},
the {\tt WHILE} statement only halts when {\tt CARD=BADCARD[L]},
i.e., the card is in the list. When {\tt L}~reaches the value
{\tt NUMBER+1}, however, the {\tt WHILE} statement stops because
{\tt L<=NUMBER} is not true.

By
copying {\tt CARD} into {\tt BADCARD[NUMBER+1]} as a sentinel, the
iteration can be simplified:

{\obeylines\obeyspaces\let =\ \tt
        L:=1; BADCARD[NUMBER+1]:=CARD;
        WHILE CARD<>BADCARD[L] DO L:=L+1;
        IF L<=NUMBER THEN WRITE('CARD IS BAD')
        ELSE WRITE('CARD IS GOOD')
}

This is a common theme in simplifying indefinite iterations with several
different stopping conditions: look for a way to make one of the
conditions (in this case {\tt CARD=BADCARD[L]}) become true when the
other (in the case {\tt L>NUMBER}) does.

If the array has been arranged in increasing order of the data items
stored in it, it becomes possible to eliminate large portions of the
array at once. If {\tt CARD>BADCARD[L]}, then also 
{\tt CARD>BADCARD[1],...,BADCARD[L-1]}, 
so only {\tt BADCARD[L+1],...,BADCARD[NUMBER]} 
remain to be searched. On the other side, if
{\tt CARD<=BADCARD[L]}, only {\tt BADCARD[1],...,BADCARD[L]}
need be searched. By looking near the center of the part of the array we
are searching, the size of the remaining task is halved at each iteration;
because halving a million twenty times yields (roughly) one,
in twenty tests we can search a table containing a~million items.

To keep track of the part of the array that remains to be searched,
we use variables {\tt LEFT} and {\tt RIGHT}, where
{\tt BADCARD[LEFT+1],...,BADCARD[RIGHT]} is the remaining
region to search.
This unsymmetric convention reflects the invariant that we have previously
found {\tt CARD>BADCARD[LEFT]} and {\tt CARD<=BADCARD[RIGHT]}, so
{\tt CARD} could be in {\tt BADCARD[RIGHT]} but is not in {\tt BADCARD[LEFT]}.
In Pascal:

\vfill\eject

{\obeylines\obeyspaces\let =\ \tt
        LEFT:=0; RIGHT:=NUMBER;
        WHILE RIGHT-LEFT (* SIZE TO SEARCH *) >1 DO
            BEGIN
            MIDDLE:=(LEFT+RIGHT) DIV 2;
            IF CARD>BADCARD[MIDDLE] THEN
                LEFT:=MIDDLE
            ELSE RIGHT:=MIDDLE;
            END;
        IF CARD=BADCARD[RIGHT] THEN WRITE('CARD IS BAD')
        ELSE WRITE('CARD IS GOOD').
}

\noindent Drill.  If the condition inside the iteration were
{\tt IF CARD>=BADCARD[MIDDLE]},
what other changes would need to be made to the program?

\smallskip
\noindent Drill.  Which indices of {\tt BADCARD} can the program
use, assuming {\tt NUMBER>0}? What if {\tt NUMBER=0}? What is the final value
of {\tt RIGHT-LEFT}?

\smallskip
\noindent Drill.  Instead of testing just whether {\tt CARD>BADCARD[MIDDLE]},
we could make a three-way test, like

{\obeylines\obeyspaces\let =\ \tt
        IF CARD>BADCARD[MIDDLE] THEN S1
        ELSE IF CARD=BADCARD[MIDDLE] THEN S2
        ELSE S3.
}

\noindent
What would be the best convention for {\tt LEFT} and {\tt RIGHT}?
On the average, would the program make more or fewer comparisons?

\smallskip\noindent
Drill. Would there be any advantage in arranging the array in decreasing
order?

\smallskip\noindent
Drill. Find the bug in this program:

{\obeylines\obeyspaces\let =\ \tt
        LEFT:=0; RIGHT:=NUMBER;
        WHILE LEFT<RIGHT DO
            BEGIN
            MIDDLE:=(LEFT+RIGHT) DIV 2;
            IF CARD>BADCARD[MIDDLE]
            THEN LEFT:=MIDDLE
            ELSE RIGHT:=MIDDLE;
            END;
        IF CARD=BADCARD[RIGHT]
        THEN WRITE('CARD IS BAD')
        ELSE WRITE('CARD IS GOOD')
}

\vfill\eject

%\smallskip
\noindent
Drill. The following program is designed for one particular size of sorted table,
namely~13. Does it work?

{\obeylines\obeyspaces\let =\ \tt
        LEFT:=0;
        IF CARD>BADCARD[5] THEN LEFT:=5;
        IF CARD>BADCARD[LEFT+4]
            THEN LEFT:=LEFT+4;
        IF CARD>BADCARD[LEFT+2]
            THEN LEFT:=LEFT+2;
        IF CARD>BADCARD[LEFT+1]
            THEN LEFT:=LEFT+1;
        IF CARD=BADCARD[LEFT+1]
            THEN WRITE('CARD IS BAD')
            ELSE WRITE('CARD IS GOOD')
}






\bigskip
\parindent0pt
\copyright 1984 Robert W. Floyd;
First draft September 24, 1984

\bye